Linkage mapping generally localizes disease genes to 1- to 2-cM
regions of chromosomes. In theory, further refinement of location can
be achieved by population-based studies of linkage disequilibrium
between disease locus alleles and alleles at adjacent markers. One
approach to localization, dubbed simple disequilibrium mapping, is to
determine the relative location of the disease locus by plotting
disequilibrium values against marker locations. We investigate the
simple mapping properties of five disequilibrium measures, the
correlation coefficient $\Delta$, Lewontin's $D'$, the robust
formulation of the population attributable risk $\delta$, Yule's $Q$,
and Kaplan and Weir's proportional difference $d$, under the
assumption of initial complete disequilibrium between disease and
marker loci. The studies indicate that $\delta$ is a superior measure
for fine mapping because it is directly related to the recombination
fraction between the disease and the marker loci, and it is invariant
when disease haplotypes are sampled at a rate higher than their
population frequencies, as in a case-control study. $D'$ yields
results comparable to those of $\delta$ in many realistic settings. Of
the remaining three measures, $Q$, $\Delta$, and $d$, $Q$ yields the
best results. From simulations of short-term evolution, all measures
show some sensitivity to marker allele frequencies; however, as
predicted by analytic results, $Q$, $\Delta$, and $d$ exhibit the
greatest sensitivity to variation in marker allele frequencies across
loci.

© 1995 Academic Press, Inc.



INTRODUCTION 

Linkage or pedigree analysis remains the fundamen- 
tal paradigm by which genetic epidemiologists map loci 
contributing to inherited disorders (Ott, 1991). In fact, 
numerous genes having a major effect on human dis- 
eases have been mapped to within 1 cM using such 
analyses. Further refinement in location using family
studies is difficult because recombinations are rarely
observed even within the large pedigrees that would
be required for finer mapping of these loci (Boehnke, 
1994). 

Consequently, it will often be the case that linkage
mapping of disease loci leaves about 1 Mb of DNA to
be searched by the molecular geneticist, which can be
a daunting amount unless there are natural candidate
genes in the region (e.g., Shiang et al., 1994). Any
method that narrows the amount of DNA to be 
searched would be important. One such method uses 
linkage disequilibrium to refine the location of the dis- 
ease locus. Conditional on disease status, the linkage 
disequilibrium between a mutant allele at a disease 
locus and other alleles at flanking markers is complete 
(sensu Clegg et al., 1976) at the instant the mutation
occurs. When evolutionary forces can be ignored, in- 
cluding marker and disease locus mutation, any decay 
in disequilibrium is due solely to recombination. Under 
this ideal scenario, and provided that the time since 
the disease mutation is not too long, the pattern or 
curve of disequilibrium between disease and marker 
loci will exhibit a single maximum that occurs at the 
disease locus. Consequently, the amount of linkage dis- 
equilibrium between a disease allele and closely linked 
genetic markers may yield valuable information re- 
garding the location of the disease gene. 

We term this method of linkage disequilibrium mapping simple
disequilibrium mapping because it uses only the pattern of pairwise
disequilibrium values across loci to infer the approximate location of
the disease locus. It is the method most commonly applied, although it
is clear that other methods of disequilibrium mapping may make more
efficient use of the data. For instance, Hill and Weir (1994) advance
a maximum likelihood method for disequilibrium between two loci, a
disease locus and marker locus, assuming that the population itself is
in a steady state of constant population size and selective pressures
(or neutrality). When these assumptions are met, their method will
have some very desirable properties for localizing disease genes.
Hastbacka et al. (1992) suggest another method of fine mapping using
linkage disequilibrium, which is formulated specifically for recently
founded populations. Again, this method depends on certain assumptions
about the evolutionary process, specifically exponential growth and a
single disease-producing chromosome in the founding population, as
well as knowledge of when the mutation first occurred. For a
refinement of this method, see Kaplan et al. (1995). Regardless of the
competing methods, simple disequilibrium mapping is a valid
descriptive tool that molecular biologists frequently find useful for
fine mapping.

Indeed, the problem of refined mapping of a disease locus via linkage
disequilibrium is not just of theoretical interest. It has proved
valuable in some notable instances. In the most celebrated case, the
cystic fibrosis gene was mapped using a combination of molecular and
population genetic techniques, including linkage disequilibrium
mapping (Kerem et al., 1989; Rommens et al., 1989; Riordan et al.,
1989). Ozelius et al. (1992a,b) and Risch et al. (1991, 1995) have
recently narrowed the location of the torsion dystonia gene to a small
region of chromosome 9 (9q34) using linkage disequilibrium mapping in
the Ashkenazi Jewish population. Linkage disequilibrium mapping has
also been employed to localize the gene for Friedreich ataxia using
French Canadian, Italian, and Louisiana Acadian populations (Fujita et
al., 1990; Hanauer et al., 1990; Richter et al., 1990; Pandolfo et
al., 1990; Sirugo et al., 1992), myotonic dystrophy using Caucasian
populations (Harley et al., 1991; Tsilfidis et al., 1991), Lubag's
disease using a Philippine population (Graeber et al., 1992;
Wilhelmsen et al., 1992), diastrophic dysplasia (Hastbacka et al.,
1992, 1994) and infantile neuronal ceroid lipofuscinosis (Hellsten et
al., 1993) using a Finnish population, Huntington disease using
Caucasian populations (Huntington Disease Collaborative Research
Group, 1993), Wilson disease using various populations, including
Caucasians (Petrukhin et al., 1993; Bowcock et al., 1994), and
polycystic kidney disease using a Scottish population (Snarey et al.,
1994). For marker loci, Jorde et al. (1994) found that linkage
disequilibrium was an excellent predictor of physical distance in the
adenomatous polyposis coli region of chromosome 5 using a Caucasian
population (see also Daiger et al., 1989; Jorde et al., 1993).

There are, however, reasons to be cautious about the use of linkage
disequilibrium for fine mapping. Weir (1989) and Hill and Weir (1994)
have been pessimistic about this technique because linkage
disequilibrium is influenced by other phenomena besides recombination,
namely mutation, drift, breeding system, and selection (Nei, 1987).
These population genetic phenomena can mask the impact of
recombination, leading at the least to a large variance in the
disequilibrium values among loci (Weir, 1989; Hill and Weir, 1994). At
worst, it could result in no relationship or even a misleading
relationship between physical distance and linkage disequilibrium
(Litt and Jorde, 1986; Thompson et al., 1988; Walter and Cox, 1991).

In addition, recombinant mapping or linkage analysis is fundamentally
different from simple disequilibrium mapping. Recombinant mapping
places specific bounds on the location of the disease gene, whereas
simple disequilibrium mapping can indicate only the likely location of
the gene. The precision of this likely location depends on
evolutionary phenomena, as well as the locations of the marker loci
relative to the disease locus (detailed below).

Clearly, if simple disequilibrium mapping is to be 
useful, optimal strategies must be employed. One fea- 
ture of the analysis that has not received much atten- 
tion is the measure of disequilibrium. Numerous mea- 
sures of linkage disequilibrium have been devised over 
the past 60 years of population genetic research, none 
of which has been shown to be optimal for simple dis- 
equilibrium mapping. Various measures have been 
used, and when two measures were compared (Jorde 
et al., 1994), the conclusion was that they differed very
little. 

In this report, we discuss the fine-mapping properties of five
commonly used measures of linkage disequilibrium. We first elaborate
the relationships between these measures of disequilibrium and their
relationships to other standard statistical quantities. We then show,
via simple deterministic examples, analytic methods, and stochastic
simulations, that the choice of linkage disequilibrium measure can
have a substantial impact on the accuracy and interpretability of the
simple disequilibrium mapping method. In what follows we restrict our
discussion to marker loci having two alleles and a disease locus
having two alleles, a "disease" and a "normal" allele. Thus the
haplotypes for the disease locus and any single marker locus can be
arrayed in a 2 × 2 table. Even if the marker has more than two
alleles, the association is usually with only one (e.g., under
complete disequilibrium), so marker alleles can be classified into two
classes. The assumption of a single mutation at the disease locus is a
far more important assumption.

MEASURES OF LINKAGE DISEQUILIBRIUM

Hedrick (1987) has reviewed the numerous measures of linkage
disequilibrium. In his review, Hedrick demonstrates the conditions
under which the measures, or at least a subset thereof, are highly
correlated.

Consider two loci, each locus having two alleles: a 
disease allele and a normal allele segregate at the first 
locus, and two marker alleles segregate at the other 
locus. The layout and notation of the 2 × 2 table from
a sample from the population are given in Table 1.

In Table 1, $n_{11}$ is the number of haplotypes in the sample
carrying the disease allele and marker allele A1, $n_{1+}$ is the
number of haplotypes bearing the A1 allele, $n_{+1}$ is the number of
haplotypes bearing the disease allele, and $n$ is the total number of
haplotypes sampled. Dividing these quantities by $n$ yields the
frequencies and marginal probabilities (denoted by $p$) from the
sample (Table 2).

Conditional probabilities are written similarly to the unconditional
probabilities in Table 2. For instance, the probability of having
allele A1 in the haplotype, given that the disease allele is present,
is denoted $p_{1|+1} = p_{11}/p_{+1}$. Likewise, the probability of
having the normal allele in the haplotype, given that the marker
allele is A2, is given by $p_{2|2+} = p_{22}/p_{2+}$.

TABLE 1

Layout and Notation for Sample Haplotype
Frequencies in a 2 × 2 Table

Marker    Disease allele    Normal allele    Total
A1        $n_{11}$          $n_{12}$         $n_{1+}$
A2        $n_{21}$          $n_{22}$         $n_{2+}$
Total     $n_{+1}$          $n_{+2}$         $n$

Naturally the $p$'s are only sample estimates of some underlying
unknown parameters, denoted by $\pi$'s. We use $\pi$'s in the
definitions that follow, with the understanding that these unknown
quantities are estimated from the observed sample quantities.

The basic component of many measures of disequilibrium is the
difference between the observed and the expected (under independence)
number of haplotypes bearing the disease allele and the A1 allele or
its equivalent expressions:

$$D = \pi_{11} - \pi_{1+}\pi_{+1} = \pi_{22} - \pi_{2+}\pi_{+2}
    = -\pi_{12} + \pi_{1+}\pi_{+2} = -\pi_{21} + \pi_{2+}\pi_{+1}.$$
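
As a brief check, assuming only that the four haplotype frequencies sum to one, the first form of $D$ can be rewritten as the cross-product difference $\pi_{11}\pi_{22} - \pi_{12}\pi_{21}$, which is the form used as the numerator of the standardized measures:

```latex
\begin{aligned}
D &= \pi_{11} - \pi_{1+}\pi_{+1} \\
  &= \pi_{11}(\pi_{11} + \pi_{12} + \pi_{21} + \pi_{22})
     - (\pi_{11} + \pi_{12})(\pi_{11} + \pi_{21}) \\
  &= \pi_{11}\pi_{22} - \pi_{12}\pi_{21}.
\end{aligned}
```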

According to Hill and Weir (1994), the most frequently used measure of
disequilibrium is the square of the standardized measure

$$\Delta = \frac{\pi_{11}\pi_{22} - \pi_{12}\pi_{21}}
               {(\pi_{1+}\pi_{2+}\pi_{+1}\pi_{+2})^{1/2}},$$

or $\Delta^2$. $\Delta$ is commonly squared to remove the arbitrary
sign introduced when the marker alleles are labeled.

Another common measure, introduced by Lewontin (1964), is defined as

$$D' = \frac{\pi_{11}\pi_{22} - \pi_{12}\pi_{21}}
            {\min(\pi_{1+}\pi_{+2},\ \pi_{+1}\pi_{2+})}.$$

The quantity in the denominator is the absolute maximum $D$ that could
be achieved given the table margins.
These measures are related to standard statistical measures of
association. In particular, $\Delta$ is the correlation coefficient
for a 2 × 2 table (Hill and Robertson, 1968). $\Delta$ is also
proportional to Haberman's (1973) adjusted residuals for the 2 × 2
table,

$$r_{ij} = \frac{n_{ij} - m_{ij}}
                {[m_{ij}(1 - p_{i+})(1 - p_{+j})]^{1/2}},$$

where $m_{ij}$ is the expected number in cell $ij$.

Another association measure that finds frequent use in epidemiology
and has also been used to study linkage disequilibrium is Levin's
(1953) population attributable risk $\delta^*$. This quantity is
defined as

$$\delta^* = \frac{\pi_{1+}(\phi - 1)}{1 + \pi_{1+}(\phi - 1)},$$

where $\phi = \{\pi_{11}/\pi_{1+}\}/\{\pi_{21}/\pi_{2+}\}$ is the
relative risk. An approximation for this measure of association or
disequilibrium was first used in the population genetics context by
Bengtsson and Thomson (1981) (see also Thomson, 1981). Specifically,
by appealing to the odds ratio approximation to the relative risk
(e.g., Breslow and Day, 1980), one obtains after some algebra an
approximation for the population attributable risk that is robust to
sampling disease haplotypes at a higher rate than their population
frequencies (i.e., case-control sampling),

$$\delta = \frac{\pi_{11}\pi_{22} - \pi_{12}\pi_{21}}
                {\pi_{+1}\pi_{22}}$$

(Levin and Bertell, 1978). Subsequent to Bengtsson and Thomson's
(1981) research on HLA associations, $\delta$ has been used by Ozelius
et al. (1992a,b) and Risch et al. (1991, 1995) for simple
disequilibrium mapping. Most recently it has been rederived and used
for disequilibrium mapping by Lehesjoki et al. (1993), who referred to
it as $P_{\mathrm{excess}}$, and by Terwilliger (1995), who referred
to it as $\lambda$; however, these measures are simply $\delta$.

The measure $\delta^*$ is not entirely new to population genetics. In
fact, when the disease is rare and haplotypes are sampled at random,
$\delta \approx \delta^* = D'$:

$$\delta^* = \frac{\pi_{11}\pi_{22} - \pi_{12}\pi_{21}}
                  {\pi_{+1}\pi_{2+}}$$

after some algebra. But

$$D' = \frac{\pi_{11}\pi_{22} - \pi_{12}\pi_{21}}
            {\min(\pi_{1+}\pi_{+2},\ \pi_{+1}\pi_{2+})}
     = \frac{\pi_{11}\pi_{22} - \pi_{12}\pi_{21}}
            {\pi_{+1}\pi_{2+}}$$

when the table is structured so that $D$ is positive and the disease
is rare relative to the associated marker allele frequency (see also
Thomson, 1981). The denominator of $D'$ is typically
$\pi_{+1}\pi_{2+}$ for rare diseases and random sampling because it is
the minimum if $\pi_{12} - \pi_{21} = \pi_{1+} - \pi_{+1} > 0$. This
condition is met whenever the associated marker allele is more common
in the population than the disease allele. Notice that
$\delta^* = D'$ and $\delta$ differ only in their denominators,
$\pi_{+1}\pi_{2+}$ versus $\pi_{+1}\pi_{22}$. For a rare disease,
$\pi_{21}$ is small, so $\pi_{2+} = \pi_{21} + \pi_{22} \approx
\pi_{22}$. However, $D' \neq \delta$ under case-control sampling.

TABLE 2

Notation for Estimated Haplotype, Marker Allele,
and Disease Allele Frequencies in a 2 × 2 Table

Marker    Disease allele    Normal allele    Total
A1        $p_{11}$          $p_{12}$         $p_{1+}$
A2        $p_{21}$          $p_{22}$         $p_{2+}$
Total     $p_{+1}$          $p_{+2}$         1

Another epidemiologic measure (Nei and Li, 1980), which was
specifically recommended for disequilibrium mapping when case-control
sampling is employed (Kaplan and Weir, 1992), is the difference in
proportions $d$:

$$d = \frac{\pi_{11}}{\pi_{+1}} - \frac{\pi_{12}}{\pi_{+2}}
    = \frac{\pi_{11}\pi_{22} - \pi_{12}\pi_{21}}{\pi_{+1}\pi_{+2}}.$$

Other natural epidemiologic measures, again robust to case-control
sampling, have found some use in population genetics, specifically the
odds ratio $\lambda$ and Yule's (1900) $Q$ (e.g., Clegg et al., 1976;
Nei and Li, 1980; Olson and Wijsman, 1994). Recall that

$$\lambda = \frac{\pi_{11}\pi_{22}}{\pi_{12}\pi_{21}}$$

and therefore ranges from zero to infinity, while

$$Q = \frac{\lambda - 1}{\lambda + 1}
    = \frac{\pi_{11}\pi_{22} - \pi_{12}\pi_{21}}
           {\pi_{11}\pi_{22} + \pi_{12}\pi_{21}}$$

ranges between negative one and one. The last expression for $Q$ shows
its relationship to $\delta$. In fact, the numerators of $\Delta$,
$D'$, $\delta$, $d$, and $Q$ are all equal to $D$, and these measures
differ only in their denominators, which serve to standardize $D$
(Table 3).

TABLE 3

Disequilibrium Measures Commonly Used
for Fine-Scale Mapping

Symbol      Formula
$\Delta$    $(\pi_{11}\pi_{22} - \pi_{12}\pi_{21}) / (\pi_{1+}\pi_{2+}\pi_{+1}\pi_{+2})^{1/2}$
$D'$        $(\pi_{11}\pi_{22} - \pi_{12}\pi_{21}) / (\pi_{+1}\pi_{2+})$
$\delta$    $(\pi_{11}\pi_{22} - \pi_{12}\pi_{21}) / (\pi_{+1}\pi_{22})$
$d$         $(\pi_{11}\pi_{22} - \pi_{12}\pi_{21}) / (\pi_{+1}\pi_{+2})$
$Q$         $(\pi_{11}\pi_{22} - \pi_{12}\pi_{21}) / (\pi_{11}\pi_{22} + \pi_{12}\pi_{21})$

Note. The notation, with $\pi$'s substituted for $p$'s, is defined in
Table 1. Note that the numerators of the measures are identical but
the denominators, which standardize the measures, are not. The formula
for $D'$ is a special case discussed in the text.
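
The shared-numerator structure of the five measures can be sketched in a few lines of Python. This is an illustrative sketch, not code from the paper: the function name `ld_measures`, the dictionary keys, and the example frequencies are assumptions, and the table is assumed to be labeled so that $D$ is positive with nonzero margins.

```python
from math import sqrt

def ld_measures(p11, p12, p21, p22):
    """Return D and the five standardized measures for a 2 x 2 haplotype table.

    Indexing follows the text: first subscript = marker allele (1 = A1),
    second subscript = disease status (1 = disease allele).
    Assumes the table is labeled so that D > 0 and all margins are nonzero.
    """
    row1, row2 = p11 + p12, p21 + p22      # pi_{1+}, pi_{2+}
    col1, col2 = p11 + p21, p12 + p22      # pi_{+1}, pi_{+2}
    D = p11 * p22 - p12 * p21              # common numerator
    return {
        "D": D,
        "Delta": D / sqrt(row1 * row2 * col1 * col2),
        "Dprime": D / min(row1 * col2, col1 * row2),
        "delta": D / (col1 * p22),
        "d": D / (col1 * col2),
        "Q": D / (p11 * p22 + p12 * p21),
    }

# Illustrative (made-up) table: rare disease allele, common marker allele.
m = ld_measures(0.008, 0.492, 0.002, 0.498)
for name, value in m.items():
    print(f"{name:7s} {value:.4f}")
```

With a rare disease allele, as in this made-up table, `Dprime`, `delta`, and `Q` come out close to one another, while `Delta` and `d` are scaled quite differently by their denominators.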

In what follows, we focus on five measures of disequilibrium (or
association): $\Delta$, $D'$, $\delta$, $d$, and $Q$. One might
conjecture that they all yield equivalent information for simple
disequilibrium mapping. However, we illustrate by some deterministic
examples, analytic results, and evolutionary simulations that this is
not the case. In our examples we assume that the two haplotypes for
each individual can be determined (as, for example, for a recessive
disease or for multiplex families with a dominant disease). However,
our conclusions also apply to the more general situation.

THE PERFORMANCE OF LINKAGE DISEQUILIBRIUM 
MEASURES FOR SIMPLE FINE MAPPING 

Deterministic Calculations

Predicted patterns in populations. Imagine there are 50 founders of a
new population. One individual carries a dominant disease allele $D$
at a locus of interest. On the chromosome bearing the disease locus,
let there be three biallelic markers on each side of the disease
locus, and one biallelic marker at the disease locus itself ($M_0$).
The two markers adjacent to the disease locus are equidistant from it
(denote them $M_1$ and $M_{1'}$), the next furthest pair are also
equidistant from the disease locus ($M_2$ and $M_{2'}$, keeping the
"primes" on the same side), and likewise for the furthest markers
($M_3$ and $M_{3'}$). Because the pairs are equidistant from the
disease locus, the recombination rates between disease locus and
marker are assumed to be equal; define them to be
$\theta_1 = \theta_{1'} = 0.002$, $\theta_2 = \theta_{2'} = 0.007$,
and $\theta_3 = \theta_{3'} = 0.012$. In the population, let the
allele frequency vectors for these seven markers, from $M_3$ to
$M_{3'}$, be (0.25, 0.75), (0.5, 0.5), (0.25, 0.75), (0.5, 0.5),
(0.5, 0.5), (0.25, 0.75), (0.5, 0.5) and let the first value in each
tuple correspond to the frequency of the marker allele carried on the
chromosome bearing the disease allele.
Assume that the allele frequencies of the markers and the disease
locus remain relatively stable from generation to generation. However,
after the initial appearance of the disease allele, recombination
erodes the disequilibrium between the disease allele and the marker
alleles. At founding, or generation $t = 0$, the joint frequency of
the disease allele and the "leading" marker allele (i.e., the first
allele at each locus) is 0.01. When we sample this population at
$t = 100$ generations, the expected frequencies of disease chromosomes
bearing the original marker alleles are, from $M_3$ to $M_{3'}$,
0.0047, 0.0075, 0.0086, 0.01, 0.0091, 0.0062, and 0.0065.

Notice that the pairs of equidistant markers do not have equal joint
frequencies of disease allele and marker alleles even though their
recombination rates are the same and, initially, the joint frequencies
are equal. This is because, at a given locus, recombination need not
generate a new haplotype. The rate at which recombination generates
new haplotypes depends on marker allele frequencies, which are not
equal for the equidistant markers even in the founding generation.
Consider a marker locus $M_i$ at generation $t = 0$. The joint
frequency of disease allele $D$ and marker allele 1 is given by
$\pi_{11,i}^{(t=0)} = \pi_{1+,i}\pi_{+1} + D_{i,0}$. ($D$ is the
disequilibrium measure defined above.) The disequilibrium for the
100th generation is $D_{i,100} = (1 - \theta_i)^{100} D_{i,0}$. The
joint frequency at $t = 100$ is then
$\pi_{11,i}^{(t=100)} = \pi_{1+,i}\pi_{+1} + D_{i,100}$. All joint
frequencies for our example can be calculated in this fashion.
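
The calculation just described can be sketched in Python. The function name `joint_frequency` and the list layout are illustrative assumptions; the marker frequencies, recombination fractions, and disease allele frequency (0.01) are the ones given in the example, and the model is the deterministic one in the text (constant allele frequencies, decay by recombination only).

```python
def joint_frequency(marker_freq, disease_freq, theta, t):
    """Expected frequency of the original disease/marker haplotype after t generations.

    Assumes complete initial disequilibrium, constant allele frequencies,
    and decay by recombination only: D_t = (1 - theta)**t * D_0.
    """
    # At t = 0 every disease chromosome carries the associated marker allele,
    # so pi_11 = pi_{+1} and D_0 = pi_{+1} - pi_{1+} * pi_{+1}.
    d0 = disease_freq - marker_freq * disease_freq
    return marker_freq * disease_freq + (1.0 - theta) ** t * d0

# Markers M3, M2, M1, M0, M1', M2', M3' from the example in the text.
marker_freqs = [0.25, 0.5, 0.25, 0.5, 0.5, 0.25, 0.5]
thetas = [0.012, 0.007, 0.002, 0.0, 0.002, 0.007, 0.012]

expected = [round(joint_frequency(p, 0.01, th, 100), 4)
            for p, th in zip(marker_freqs, thetas)]
print(expected)
```

Rounded to four decimals, the values agree with the seven joint frequencies quoted for generation 100.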

These differences between frequencies, for both marker alleles and
joint frequencies of disease and marker alleles, are important for
fine mapping. As Hedrick (1987) emphasized, measures of disequilibrium
such as $\Delta$ can be difficult to interpret when loci differ in
their allele frequencies. Other measures, such as $D'$ and $\delta$,
are more easily interpreted. Furthermore, the ability to determine
correctly the location of the disease locus from the pattern of
disequilibrium values depends on the measure used. For example,
consider the disequilibrium values from our population at generation
$t = 100$ (Fig. 1, top). The maxima for $\Delta$ and $d$ are not at
the disease locus, but at an adjacent marker. (To make the measures
comparable in Fig. 1, $\Delta$ and $d$ have been rescaled by
multiplying the set of values for each measure and scenario by a
constant.) In addition, these disequilibrium measures yield multimodal
patterns of disequilibrium. By contrast, $\delta$ and $D'$ exhibit
almost identical behavior: they are unimodal and essentially symmetric
and their maximum is at the disease locus. Finally, $Q$ has a maximum
at the appropriate location, but it shows marked deviation from
symmetry.
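
The generation-100 comparison can be reproduced approximately with a short Python sketch. It assumes the deterministic model of the example (disease allele frequency 0.01, decay by recombination only); the names `MARKERS`, `measures`, and `peak` are illustrative, not from the paper.

```python
from math import sqrt

Q_DIS = 0.01   # disease allele frequency pi_{+1}
T = 100        # generations since founding

# (label, associated marker allele frequency pi_{1+}, recombination fraction)
MARKERS = [("M3", 0.25, 0.012), ("M2", 0.5, 0.007), ("M1", 0.25, 0.002),
           ("M0", 0.5, 0.0), ("M1'", 0.5, 0.002), ("M2'", 0.25, 0.007),
           ("M3'", 0.5, 0.012)]

def measures(p, theta, q=Q_DIS, t=T):
    """Five measures for one marker under deterministic decay of D."""
    D = (1 - theta) ** t * q * (1 - p)          # D_t = (1 - theta)^t * D_0
    p11 = p * q + D
    p12, p21 = p - p11, q - p11
    p22 = 1 - p11 - p12 - p21
    return {
        "Delta": D / sqrt(p * (1 - p) * q * (1 - q)),
        "Dprime": D / min(p * (1 - q), q * (1 - p)),
        "delta": D / (q * p22),
        "d": D / (q * (1 - q)),
        "Q": D / (p11 * p22 + p12 * p21),
    }

table = {label: measures(p, th) for label, p, th in MARKERS}

# For each measure, find the marker at which it is largest.
peak = {name: max(table, key=lambda lab: table[lab][name])
        for name in ["Delta", "Dprime", "delta", "d", "Q"]}
print(peak)
```

Under these assumptions `Dprime`, `delta`, and `Q` peak at `M0`, whereas `Delta` and `d` peak at the adjacent marker `M1`, matching the description of the top panel of Fig. 1.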

If we examine the population in an early generation, say $t = 5$, the
results would be even more dramatic (Fig. 1, bottom). In this case, it
would be difficult to define even a region to search for the disease
locus if the researcher uses $\Delta$ or $d$ as the measure of
disequilibrium. $D'$ and $\delta$ place the disease locus in the
appropriate location, although the peak itself is little
differentiated from other locations. $Q$ shows behavior similar to
that of $D'$ and $\delta$ in that its maximum is the same; however,
$Q$ has another peak at the extreme left marker and it also has other
asymmetries such as the "right shoulder." From these examples, it
should be clear that the choice of disequilibrium measure can be
important for simple disequilibrium mapping.

[Figure 1: two panels (top and bottom); horizontal axis: distance
(cM), from -0.010 to 0.010.]

FIG. 1. Linkage disequilibrium versus recombination fraction for five
disequilibrium measures: 1 = $D'$, 2 = $\delta$, 3 = $\Delta$,
4 = $Q$, and 5 = $d$. The patterns displayed are generated by a model
population (see text for details). Both $D'$ and $\delta$ display an
ideal pattern (overlapping solid lines) for simple disequilibrium
mapping.

To see why this is so, recall the expression for disequilibrium at
generation $n$, $D_n = (1 - \theta)^n D_0$. Ideally we desire a
measure that is a function of $\theta$ only, for instance

$$(1 - \theta)^n = \frac{D_n}{D_0}
  = \frac{\pi_{11}\pi_{22} - \pi_{12}\pi_{21}}{\pi_{+1}\pi_{22}}.$$

Our rationale for the denominator of this expression is as follows.
Under the assumption of initial complete linkage disequilibrium and no
change of disease allele frequency over time,
$\pi_{11} + \pi_{21} = \pi_{+1}$ is the best estimate in generation
$n$ of the initial disease allele frequency and hence of $\pi_{11}$;
$\pi_{21}$ at generation 0 is 0. Thus $\pi_{+1}\pi_{22}$ is the best
estimate of $D_0$, the initial amount of linkage disequilibrium.
Reexpressing the measures of linkage disequilibrium in terms of
$(1 - \theta)^n$ is revealing. As shown above, $(1 - \theta)^n$ is
exactly $\delta$.

For $D'$, when $D > 0$ and $\pi_{1+} > \pi_{+1}$,

$$(1 - \theta)^n = D'\left(1 + \frac{\pi_{21}}{\pi_{22}}\right).$$

Thus, the relationship between $D'$ and $\theta$ depends somewhat on
haplotype and marker allele frequencies. For rare diseases, however,
$\pi_{21} \ll \pi_{22}$ usually, making $D'$ essentially only a
function of $\theta$. An exception occurs when one of the marginal
marker allele frequencies is rare: The effect of rare $\pi_{2+}$ can
be seen in the expression above; the effect of rare $\pi_{1+}$ is to
change the denominator of $D'$. In fact, when $\pi_{1+} < \pi_{+1}$,

$$(1 - \theta)^n = D'\left(
  \frac{(1 + \pi_{12}/\pi_{11})(1 + \pi_{12}/\pi_{22})}
       {1 + \pi_{21}/\pi_{11}}\right),$$

making the relationship between $D'$ and $\theta$ in this exceptional
case dependent on haplotype and marker allele frequencies.

The relationship between $Q$ and $\theta$,

$$(1 - \theta)^n = Q\left(
  \frac{\pi_{11}/\pi_{21} + \pi_{12}/\pi_{22}}
       {\pi_{11}/\pi_{21} + 1}\right),$$

also reveals a dependence on haplotype frequencies. The coefficient of
$Q$ potentially ranges between $(0, \infty)$, although extreme values
occur only when the unassociated marker allele frequency is small.

The relationship between $d$ and $\theta$ can be deduced from the
relationship between $\delta$ and $\theta$ because
$d = \pi_{22}\delta/\pi_{+2}$ and therefore

$$(1 - \theta)^n = d\,\frac{\pi_{+2}}{\pi_{22}}.$$

Thus, $d$ depends on haplotype and marker allele frequencies.

Finally, it is apparent that the relationship between $\Delta$ and
$\theta$ is obscured by marginal allele frequencies:

$$(1 - \theta)^n = \Delta\,\frac{1}{\pi_{22}}
  \left(\frac{\pi_{1+}\pi_{2+}\pi_{+2}}{\pi_{+1}}\right)^{1/2}.$$
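
The coefficients relating each measure to $\delta$ (the text's estimate of $(1 - \theta)^n$) are purely algebraic, so they can be checked numerically. The sketch below is illustrative only: `table_after_decay`, the variable names, and the chosen parameter values are assumptions, and the table is generated with the same deterministic decay model used in the examples.

```python
from math import sqrt, isclose

def table_after_decay(p, q, theta, n):
    """2 x 2 haplotype frequencies after n generations of recombination,
    starting from complete disequilibrium (deterministic, no drift)."""
    D = (1 - theta) ** n * q * (1 - p)
    p11 = p * q + D
    p12, p21 = p - p11, q - p11
    return p11, p12, p21, 1 - p11 - p12 - p21

p11, p12, p21, p22 = table_after_decay(p=0.167, q=0.01, theta=0.003, n=100)
r1, r2, c1, c2 = p11 + p12, p21 + p22, p11 + p21, p12 + p22
D = p11 * p22 - p12 * p21

delta = D / (c1 * p22)
Dprime = D / min(r1 * c2, c1 * r2)
Q = D / (p11 * p22 + p12 * p21)
d = D / (c1 * c2)
Delta = D / sqrt(r1 * r2 * c1 * c2)

# Each measure times its frequency-dependent coefficient recovers delta.
via_Dprime = Dprime * (1 + p21 / p22)            # valid when pi_{1+} > pi_{+1}
via_Q = Q * (p11 / p21 + p12 / p22) / (p11 / p21 + 1)
via_d = d * c2 / p22
via_Delta = Delta * sqrt(r1 * r2 * c2 / c1) / p22

for name, value in [("D'", via_Dprime), ("Q", via_Q),
                    ("d", via_d), ("Delta", via_Delta)]:
    print(f"delta recovered from {name:5s}: {value:.6f} (delta = {delta:.6f})")
```

Only $\delta$ needs no coefficient; every other measure must be multiplied by a factor built from haplotype or marginal frequencies, which is the source of the sensitivity discussed in the text.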

When all five measures are compared, it becomes apparent that $\delta$
is the measure most directly related to $\theta$. Furthermore,
reexpressing $\delta$ in terms of haplotype frequencies,

$$\delta = \frac{\pi_{11}/\pi_{21} - \pi_{12}/\pi_{22}}
                {\pi_{11}/\pi_{21} + 1},$$

shows that it depends only on the relative frequencies
$\pi_{11}/\pi_{21}$ and $\pi_{12}/\pi_{22}$ and not on the marginal
marker allele frequencies (see also Thomson, 1981). Thus it is the
ideal measure for simple disequilibrium mapping.

Other measures, such as $D'$, are also proportional to $\theta$, at
least under certain circumstances. To illustrate the behavior of the
five measures, we first use deterministic calculations similar to our
first example. All calculations are based on complete disequilibrium
between a disease allele (occurring with frequency 0.01) and marker
allele at generation 0, which breaks down over 100 generations by
recombination alone. The results (Table 4) at generation 100 reveal
the sensitivity of $\Delta$, $Q$, and $d$ to haplotype and marker
allele frequency variation. Low-frequency alleles far from the disease
locus give higher values of $\Delta$, $d$, and $Q$ than closer,
high-frequency alleles. $\delta$ and $D'$ are insensitive to such
variation.

TABLE 4

Expected Disequilibrium after 100 Generations of Random Recombination
between Marker and Disease Loci for Five Measures of Disequilibrium,
as a Function of Associated Marker Allele Frequency and Recombination
Fraction $\theta$

                          Marker allele frequency
Measure     $\theta$    0.167    0.333    0.5      0.667    0.833
$\delta$    0.003       0.742    0.742    0.742    0.742    0.742
            0.006       0.550    0.550    0.550    0.550    0.550
            0.009       0.407    0.407    0.407    0.407    0.407
$D'$        0.003       0.740    0.740    0.740    0.740    0.740
            0.006       0.548    0.548    0.548    0.548    0.548
            0.009       0.405    0.405    0.405    0.405    0.405
$Q$         0.003       0.900    0.815    0.744    0.684    0.634
            0.006       0.790    0.650    0.552    0.479    0.424
            0.009       0.677    0.600    0.408    0.340    0.292
$\Delta$    0.003       0.166    0.105    0.074    0.053    0.033
            0.006       0.123    0.078    0.055    0.039    0.025
            0.009       0.091    0.058    0.041    0.029    0.018
$d$         0.003       0.623    0.499    0.374    0.249    0.125
            0.006       0.461    0.367    0.277    0.184    0.092
            0.009       0.340    0.273    0.205    0.136    0.068

Note. The disease and marker loci were initially in complete
disequilibrium, with a disease allele frequency of 0.01.

Thus, from deterministic calculations, it appears that either
$\delta$ or $D'$ is ideal for simple disequilibrium mapping. However,
the attributes of $D'$ depend strongly on its denominator, which, in
turn, depends on marker allele and disease allele frequencies. $D'$ is
usually directly related to $\theta$; however, for common disease
alleles, or for case-control sampling, $D'$ need not be directly
related to $\theta$.

Effect of case-control sampling. A common strategy in linkage
disequilibrium studies is to sample higher proportions of diseased
individuals relative to their population frequencies, as in a
case-control study. If the study, by design, samples disease
chromosomes with probability $\pi^{*}_{+1}$, as compared to
$\pi_{+1}$, the value $c = \pi^{*}_{+1}/\pi_{+1}$ is a convenient
means of expressing the effect of such case-control sampling on the
relative frequencies in a 2 × 2 table (Table 5). When we discuss
case-control sampling, we take $c = 50$ and $\pi_{+1} = 0.01$. This
sampling yields equal numbers of disease and normal chromosomes,
analogous to the typical strategy for case-control studies.

TABLE 5

Haplotype, Marker, and Disease Frequencies for
c-fold Increased Sampling of Disease Haplotypes

Marker    Disease allele    Normal allele                         Total
A1        $c\pi_{11}$       $\pi_{12}(1 - c\pi_{+1})/\pi_{+2}$    $(cD + \pi_{12})/\pi_{+2}$
A2        $c\pi_{21}$       $\pi_{22}(1 - c\pi_{+1})/\pi_{+2}$    $(\pi_{22} - cD)/\pi_{+2}$
Total     $c\pi_{+1}$       $1 - c\pi_{+1}$                       1

Whereas haplotype frequencies change, $\delta$ and $Q$ are unaffected
by case-control sampling. This invariance follows formally from the
fact that both measures are functions of the odds ratio, which is
invariant to case-control sampling (Edwards, 1963). Likewise, $d$ is
also unaffected by case-control sampling, as can be seen by
substituting the adjusted haplotype frequencies (Table 5) into the
equations for $d$.
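
The invariance claims can be checked numerically. The following Python sketch is illustrative: the function names and the population frequencies are made up for the purpose, and `case_control` simply rescales the disease column by $c$ and the normal column so that the table still sums to one, as in Table 5.

```python
from math import sqrt, isclose

def measures(p11, p12, p21, p22):
    """Five disequilibrium measures for a 2 x 2 table (D assumed positive)."""
    r1, r2, c1, c2 = p11 + p12, p21 + p22, p11 + p21, p12 + p22
    D = p11 * p22 - p12 * p21
    return {
        "Delta": D / sqrt(r1 * r2 * c1 * c2),
        "Dprime": D / min(r1 * c2, c1 * r2),
        "delta": D / (c1 * p22),
        "d": D / (c1 * c2),
        "Q": D / (p11 * p22 + p12 * p21),
    }

def case_control(p11, p12, p21, p22, c):
    """Table frequencies when disease haplotypes are sampled c times more often."""
    c1, c2 = p11 + p21, p12 + p22
    k = (1 - c * c1) / c2   # rescales the normal column to total 1 - c*pi_{+1}
    return c * p11, k * p12, c * p21, k * p22

population = (0.0087, 0.4913, 0.0013, 0.4987)   # illustrative values only
sampled = case_control(*population, c=50)

before, after = measures(*population), measures(*sampled)
for name in before:
    status = "invariant" if isclose(before[name], after[name]) else "changes"
    print(f"{name:7s} {before[name]:.3f} -> {after[name]:.3f}  {status}")
```

In this example `delta`, `d`, and `Q` are unchanged by the resampling, while `Delta` and `Dprime` shift, in line with the statement that only the latter two are affected.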

The other two measures are affected by case-control sampling in some
way. We examined the effect of case-control sampling on the pattern of
disequilibrium across marker loci by supposing that a grid of markers
surrounds the disease locus and the marker allele frequencies vary
systematically between 0.083 and 0.917. Other attributes are identical
to those used to develop Table 4, except that $c = 50$ (Table 5), so
that equal numbers of disease and normal haplotypes are observed.

For $D'$, the pattern is frequently multimodal and, for small (<0.1)
marker frequencies, the maximum disequilibrium need not occur at the
proximate marker locus (data not shown). These results are quite
different from our results for random sampling, in which the pattern
was always unimodal with a maximum at the proximate locus. The impact
of case-control sampling on $D'$ is mediated, in large part, through
the choice of denominator. (Recall that the relationship between $D'$
and $\theta$ depends critically on which of two terms is the minimum.)
For case-control sampling, the denominator of $D'$ is

$$\min\left[\, c\pi_{+1}\,\frac{\pi_{22} - cD}{\pi_{+2}},\;
  (1 - c\pi_{+1})\,\frac{cD + \pi_{12}}{\pi_{+2}} \,\right],$$

and the first expression is the minimum if
$\pi_{12}(1 - c\pi_{+1})/\pi_{+2} - c\pi_{21} > 0$. This expression,
which parallels that found for random sampling, can be greater than
zero only when the sampled disease haplotype frequency is less than
the associated marker allele frequency.

The results using $\Delta$ as the measure of disequilibrium differ
markedly from those using $D'$ (data not shown). In this instance, the
multimodality of the pattern changes very little from that obtained
from the population, although case-control sampling leads to an
increase in the number of times the proximate marker locus shows
maximum disequilibrium. With some algebra, it can be shown that this
increase occurs because case-control sampling changes the relationship
between disequilibrium values at different loci relative to the values
obtained from random sampling.

Impact of Stochastic Factors

We examined the impact of evolutionary forces by simulation of
short-term population evolution. Details of the simulations are given
in the Appendix. In brief, each population initially consisted of 2000
chromosomes (i.e., 1000 individuals), which then grew over 100
generations to a size of 100,000 chromosomes. Population expansion
occurred at a constant exponential rate. Recombination occurred at
random, as did reproduction. No mutation occurred.
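
For orientation, the growth schedule implied by these numbers can be worked out directly. The sketch below assumes a constant per-generation growth factor and rounds the size to whole chromosomes each generation; how the original simulations handled rounding is not stated here, so that detail is an assumption.

```python
n0, n_final, generations = 2000, 100_000, 100

# Constant per-generation growth factor g such that n0 * g**generations == n_final.
g = (n_final / n0) ** (1 / generations)

# Population size (in chromosomes) at each generation, rounded to whole numbers.
sizes = [round(n0 * g ** t) for t in range(generations + 1)]

print(f"growth per generation: {100 * (g - 1):.2f}%")
print("size at generations 0, 50, 100:", sizes[0], sizes[50], sizes[100])
```

A 50-fold expansion over 100 generations corresponds to growth of roughly 4% per generation.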

To examine systematically the impact of variation in marker allele
frequencies, we simulated populations of chromosomes having three
marker loci, at distances $\theta = 0.001$, 0.004, 0.007 from the
disease locus. Initial marker allele frequencies were one of three
values, 0.1, 0.5, 0.9, and all possible combinations of those values
for different loci were examined (i.e., 27 sets). For each combination
of marker allele frequencies, 80 populations were simulated. The
initial disease allele frequency was set to 0.01; if the frequency of
this allele dropped below 0.005 during any generation, the simulation
was reinitialized at generation zero. Marker allele frequencies were
allowed to go to zero (which rarely occurred), in which case the locus
was ignored as it was not polymorphic.

Two types of data were examined from each population:
the disequilibrium pattern for the population as
a whole (population pattern) and the disequilibrium
pattern for case-control sampling (with c = 50, in
expectation); specifically, 200 disease chromosomes and
200 normal chromosomes were sampled. We recorded, for
each set of allele frequencies, the fraction of the time
the nearest marker exhibited the greatest disequilibrium
and the mean square error (MSE), computed as
the mean of the squared recombinational distance
between the disease locus and the marker exhibiting
maximum disequilibrium with the disease locus.
Ideally the MSE would be (0.001)^2 = 1E-6,
which would occur if the nearest locus always exhibited
maximum disequilibrium. MSE is an appropriate measure
of variability in this instance because it naturally
incorporates both variance and any bias into a single
statistic.
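
As an illustration of these two summaries, the sketch below computes
them for a hypothetical set of 80 populations in which the nearest
marker (θ = 0.001) shows maximum disequilibrium 79 times and the second
marker (θ = 0.004) once. The function name and the example outcome are
assumptions made for illustration, not output of the original
simulations.

```python
def summarize(max_marker_distances, nearest=0.001):
    """Return (fraction of populations choosing the nearest marker, MSE).

    max_marker_distances holds, for each simulated population, the
    recombination distance of the marker with maximum disequilibrium.
    """
    n = len(max_marker_distances)
    hits = sum(1 for dist in max_marker_distances if dist == nearest)
    mse = sum(dist * dist for dist in max_marker_distances) / n
    return hits / n, mse

# Hypothetical outcome for one allele-frequency configuration.
outcomes = [0.001] * 79 + [0.004]
fraction, mse = summarize(outcomes)
print(fraction, mse)
```

When every population selects the nearest marker, the function returns
a fraction of 1 and an MSE of 1E-6, the ideal value noted above.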

The simulation results agree with the deterministic
calculations in terms of the average performance over
all sets of allele frequencies. For the population
pattern, the nearest marker locus exhibited the greatest
disequilibrium the highest fraction of times with both
δ and D' (83.5%), followed by Q (81.2%), then d (52.9%),
and finally Δ (48.3%). MSE shows the identical pattern,
with both δ and D' having the smallest MSE
(4.94E-6), followed by Q (5.17E-6), then d (1.38E-5),
and finally Δ (1.55E-5). Taking the square root of the
MSE, we have 0.0022, 0.0023, 0.0037, and 0.0039,
respectively. (Recall that MSE emphasizes occasional
larger deviations relative to a measure such as the
average absolute deviation.)
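
The square roots quoted above follow directly from the reported MSE
values, as the short check below shows. The dictionary keys are merely
labels for the measures.

```python
import math

# Population-pattern MSE values as reported in the text.
mse_population = {
    "delta": 4.94e-6,
    "D'": 4.94e-6,
    "Q": 5.17e-6,
    "d": 1.38e-5,
    "Delta": 1.55e-5,
}

# Root mean square error, rounded to four decimal places.
rmse = {name: round(math.sqrt(value), 4) for name, value in mse_population.items()}
print(rmse)
```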

For case-control sampling, δ outperforms all other
measures in terms of the pattern of maximum
disequilibrium and MSE (81.2% and 5.39E-6), followed by D'
(78.1% and 6.55E-6), Q (76.6% and 6.79E-6), Δ (55.1%
and 1.29E-5), and d (52.8% and 1.39E-5). Taking the
square root of the MSE yields 0.0023, 0.0026, 0.0026,
0.0036, and 0.0037, respectively. Note that, as
predicted by the deterministic calculations, the
performances of δ and D' are now distinct and that the
performance of Δ improves with case-control sampling,
relative to the population patterns. Moreover, because we
sampled a relatively large number of haplotypes (200
disease, 200 normal), the impact of sampling error per
se is small. Naturally, smaller sample sizes will
increase the MSE of simple disequilibrium mapping.

Substantive patterns are hidden by these data
summaries. As demonstrated by the deterministic
calculations, the poor performance of Δ and d is due to a bias
involving the magnitude of allele frequencies. Large
disequilibrium values are associated with small allele
frequencies and vice versa; thus, both measures, when
used for simple disequilibrium mapping, frequently
cause it to be an inconsistent estimator of the marker
nearest the disease locus (i.e., the estimator does not
converge to the true answer as the sample size tends
to infinity). However, this bias could also fortuitously
work in the investigator's favor. For instance, when the
nearest marker's associated allele frequency is small,
and other associated marker allele frequencies are
much larger, the proximate marker will almost
invariably show a large disequilibrium value using either
Δ or d. For the simulations, when associated allele
frequencies for furthest to nearest markers were
initially set to 0.5, 0.9, 0.1, the largest disequilibrium
value occurred at the proximate marker 100% of the
time (population level). Alternatively, for the
configuration 0.9, 0.1, 0.9, the largest disequilibrium value
never occurred at the proximate marker for either
measure. The bias is illustrated in Table 6, which presents
the results for case-control sampling only.

Q shows behavior similar to that of Δ and d. As the
deterministic calculations suggest, however, it is less
sensitive to the magnitude of marker allele frequencies
(Table 6). The behavior of δ and D' in the stochastic
simulations deviates somewhat from the deterministic
calculations. The deterministic calculations suggest
that both measures should be unaffected by the
magnitude of marker allele frequencies, whereas the
simulations clearly show that the performance of these
measures for simple disequilibrium mapping also changes
with the configuration of allele frequencies (Table 6).
These measures are most affected when the frequency
of an associated marker allele is large. For instance,
when the associated allele frequency configuration was
0.9, 0.9, 0.1 for furthest to nearest markers, the largest
disequilibrium value occurred for the proximate
marker only 62.5% of the time (population pattern)
and the MSE was 9.1E-6. Conversely, when the
associated allele frequency configuration was 0.1, 0.1, 0.1, the
largest disequilibrium value occurred for the proximate
marker 96.25% of the time (population pattern) and
the MSE was 2.0E-6.

Most of this behavior is attributable to the variance
in δ and D'. Because these measures are essentially
identical under many circumstances, we discuss only
δ. The asymptotic standard error for log(1 − δ) is

( gy» + ^12 \ 1/2 

(Walter, 1975), and therefore the asymptotic standard
error of δ increases as the unassociated marker allele
frequency, π_{2+}, tends toward zero. While we are less
interested in statistical sampling than in genetic
sampling (sensu Weir, 1990), genetic sampling can be
thought of as repeated statistical sampling. Therefore,
the sensitivity of δ to the unassociated allele frequency,
as revealed by the formula for its asymptotic standard
error, is pertinent.

In another set of simulations, we allowed initial
marker allele frequencies to vary at random between
the limits 0.15 and 0.85 and specified seven marker
loci with recombination fractions, relative to the
disease locus, of 0.009, 0.006, 0.003, 0, 0.003, 0.006,
0.009. Other simulation conditions were the same as
those described previously, except that 200 populations
were simulated. In this case, the largest disequilibrium
value for δ, D', and Q always occurred at the disease
locus for both the population and the case-control
sampling scenarios. For Δ, the largest disequilibrium
value occurred with the proximate marker 44% of the
time for the population and 54.5% of the time for
case-control sampling. For d, the largest disequilibrium
value occurred with the proximate marker 47% of the
time for the population and 48% of the time for
case-control sampling.

TABLE 6

Simulation Results of Short-Term Evolution and Subsequent
Case-Control Sampling by Initial Marker Allele Frequency
(Furthest to Nearest Marker from Left to Right)

                                    Disequilibrium measures
Allele frequency   D'             δ              Δ              Q              d

0.1  0.1  0.1      0.90 (3.33)    0.96 (1.98)    [illegible]    [illegible]    [illegible]
0.1  0.1  0.5      0.68 (9.63)    0.95 (1.79)    0.18 (15.91)   [illegible]    [illegible]
0.1  0.1  0.9      0.78 (?)       0.85 (?)       0.00 (20.56)   [illegible]    [illegible]
0.1  0.5  0.1      0.98 (3.63)    0.98 (1.41)    1.0 (1.00)     0.99 (?)       [illegible]
0.1  0.5  0.5      [illegible]    0.91 (2.37)    [illegible]    [illegible]    [illegible]
0.1  0.5  0.9      0.94 (3.82)    0.95 (?)       0.05 (40.05)   0.84 (7.50)    [illegible]
0.1  0.9  0.1      0.68 (7.22)    0.68 (5.93)    0.99 (2.06)    ? (4.55)       [illegible]
0.1  0.9  0.5      0.60 (10.48)   0.66 (6.63)    0.51 (24.12)   0.61 (?)       [illegible]
0.1  0.9  0.9      0.81 (5.31)    0.85 (3.84)    [illegible]    [illegible]    [illegible]
0.5  0.1  0.1      0.83 (4.60)    0.90 (5.08)    0.93 (3.69)    0.91 (2.94)    0.95 (?)
0.5  0.1  0.5      0.85 (3.34)    0.96 (1.62)    0.21 (12.95)   [illegible]    [illegible]
0.5  0.1  0.9      [illegible]    [illegible]    [illegible]    0.75 (4.90)    0.01 (18.25)
0.5  0.5  0.1      0.96 (1.59)    0.94 (2.03)    0.99 (1.19)    0.99 (1.19)    0.99 (1.19)
0.5  0.5  0.5      0.93 (3.40)    0.93 (3.42)    0.83 (5.74)    0.91 (3.58)    0.79 (6.71)
0.5  0.5  0.9      0.90 (3.87)    0.90 (3.87)    0.09 (21.47)   0.86 (4.03)    0.03 (22.84)
0.5  0.9  0.1      0.60 (7.12)    0.58 (7.49)    0.99 (1.19)    0.68 (6.01)    0.99 (1.19)
0.5  0.9  0.5      0.54 (5.68)    0.54 (6.10)    0.91 (4.01)    0.58 (7.52)    0.90 (5.03)
0.5  0.9  0.9      0.80 (5.00)    0.81 (4.83)    0.20 (34.52)   0.79 (5.58)    0.05 (44.05)
0.9  0.1  0.1      0.74 (10.81)   0.76 (11.66)   0.94 (2.29)    0.86 (6.86)    0.94 (2.43)
0.9  0.1  0.5      0.79 (7.79)    0.86 (4.21)    0.19 (13.26)   0.66 (8.75)    0.16 (13.67)
0.9  0.1  0.9      0.76 (9.28)    0.81 (5.56)    0.04 (15.81)   0.66 (9.20)    0.01 (16.16)
0.9  0.5  0.1      0.80 (9.59)    0.75 (10.74)   0.98 (1.59)    0.88 (6.80)    0.98 (1.59)
0.9  0.5  0.5      0.78 (10.35)   0.78 (10.39)   0.84 (3.91)    0.78 (10.73)   0.80 (4.47)
0.9  0.5  0.9      0.85 (5.50)    0.84 (5.69)    0.05 (16.59)   0.83 (5.48)    0.01 (17.16)
0.9  0.9  0.1      0.61 (8.58)    0.56 (10.99)   0.98 (1.62)    0.68 (7.24)    0.98 (1.62)
0.9  0.9  0.5      0.55 (13.33)   0.55 (10.51)   0.98 (1.41)    0.61 (11.54)   1.0 (1.00)
0.9  0.9  0.9      0.80 (6.80)    0.80 (6.80)    0.61 (11.80)   0.80 (6.77)    0.49 (16.15)

Two statistics are presented: the fraction of times out of 80 the
nearest marker exhibited maximum disequilibrium and, in parentheses,
the mean-square error times 10^6. Ideally, (10^6) MSE would equal 1.0
because the recombinational distance between the disease locus and
the nearest marker was 0.001.

DISCUSSION

At the instant a new "disease" mutation occurs, the
disease allele is associated with alleles at other
polymorphic loci in the region. In particular, the disease
locus is in complete linkage disequilibrium (Clegg et
al., 1976) with other loci in the region. When it is
reasonable to assume that the disease locus was initially
in complete disequilibrium with other nearby marker
loci, our analyses suggest that δ, the robust version of
the population attributable risk, is the best measure of
disequilibrium for simple fine mapping. From
deterministic calculations, it is clear that δ is directly related
to θ, the recombination fraction. It is also most closely
related to θ for simulations of short-term evolution.
Under a more limited set of circumstances, Lewontin's
D' yields results comparable to δ. The fact that the two
measures behave so similarly, at least under random
sampling, is hardly surprising because we have shown
that the two are equivalent when the disease is
uncommon and marker frequencies are relatively more
common in the population. An important caveat is that
the measures are not equal when the study, by design,
employs case-control sampling.

Alternatively, Δ and d are useful for simple
disequilibrium mapping only when marker allele
frequencies vary very little from locus to locus, a circumstance
unlikely to exist in general. Q is a better measure to
use, at least relative to Δ and d. Nevertheless, as with Δ
and d, marker allele frequency variation across loci has
a substantial impact on the pattern of disequilibrium
values, especially when some marker allele frequencies
are small.



Jorde et al. (1994) used Δ to examine the relationship
between linkage disequilibrium and physical distance
in the adenomatous polyposis coli region. In that study,
D' was also examined but the results were not
reported; nevertheless, they did report that the values
for D' exhibited a pattern similar to those using Δ. It
is important to note, however, that the similarity is
most likely due to the striking similarity of allele
frequencies at the different marker loci rather than to the
inherent features of the measures.

The short-term evolutionary simulations that we
performed make it clear that forces such as drift, as
well as random recombination, influence the
relationship between linkage disequilibrium and θ. As shown
by Hill and Weir (1994) for steady-state populations,
it is apparent that drift can obscure the predicted
relationship between recombination fraction and
disequilibrium that is critical for simple disequilibrium
mapping. In this regard, the MSE statistics from the
evolutionary simulations provide a ballpark estimate of the
magnitude of error that could be incurred by using
simple disequilibrium mapping. However, as we have
shown, the performance of simple disequilibrium
mapping is affected by variation in marker allele
frequencies and by the configuration of markers surrounding
the disease locus. Undoubtedly, the MSE is also
affected by the amount of time since the initial disease
mutation, mutations at marker loci, and so on. In fact,
recurrent mutation at marker loci can have a
tremendous impact on simple disequilibrium mapping because
it mimics recombination.

In this paper, we have focused on disease loci having
a single disease allele rather than disease loci with
multiple alleles. In addition, we have focused on
diseases that are relatively uncommon in the population.
From the theory, we see no obvious impact of a common
disease on the performance of simple disequilibrium
mapping using δ because δ should still be directly
related to θ as long as there is only a single disease allele.
Of course, the interpretation of δ as a robust
approximation to the population attributable risk is
questionable because the approximation is poor under these
circumstances.

The presence of multiple disease alleles diminishes
the strength of the relationship between the disease
and the marker alleles. Suppose there are two or more
disease alleles, with only proportion α of cases
attributable to the primary disease allele. Then

— q7r u+» 0 ~ fiQ^n+j ~ 07 1 1 +2 

so the association is reduced whenever α < 1 (see also
Lehesjoki et al., 1993). Since α is constant across loci,
δ still gives a consistent pattern of disequilibrium
across loci, in theory reaching a maximum at the
disease locus. However, the smaller the value of α, the
greater the impact of nonsystematic variation (such
as drift), reducing localizing power. Nevertheless, our
results suggest that an investigator attempting simple
fine mapping will generally be most successful when
using δ (or possibly D') to describe linkage
disequilibrium.

APPENDIX 

The evolutionary simulations were performed as
follows. First a population of 2000 chromosomes was
created that had 20 chromosomes (p = 0.01) bearing the
disease allele. The alleles at each marker locus on the
20 disease chromosomes were all identical; thus, there
was initially complete disequilibrium between disease
and marker loci. For the remaining 1980 normal
chromosomes, marker alleles were assumed to be
independent. The standard conditional independence model
was therefore used to create the appropriate number
of haplotypes of each possible kind, based on expected
frequencies of two-locus haplotypes. (The expected
frequencies of normal chromosomes are directly calculable
from the marker locus allele frequencies and the
observation that p11 = 0.01 and p21 = 0 under complete
disequilibrium.)

At each generation, the population grew at
exponential rate r = 1.0041607. To accomplish this growth,
a pair of haplotypes was chosen at random from the
population at generation t using a standard random
number generator. The pair recombined randomly over
any of the three interlocus intervals with probability
equal to the recombination fractions between loci:
0.001, 0.003, 0.003. Consequently there was no
interference between intervals. Two haplotypes were
produced by this mechanism, usually the same as before,
and then one of them was chosen at random to be a
member of the t + 1 generation. This procedure was
executed n r t times to produce the t + 1 generation. This
method is not completely true to population evolution,
in which haplotypes occur in pairs for each person, and
only those pairs can recombine. Our procedure,
however, is essentially identical to the population process
because recombination events are rare, while the
simplification allowed us to lower drastically the RAM
needed to complete the simulations.
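
A minimal sketch of this procedure is given below. It is not the
original program, and several details are assumptions made for
illustration: haplotypes are tuples (disease locus, then markers in map
order from nearest to furthest), normal chromosomes are drawn at random
rather than created in expected numbers, the two parents are sampled
without replacement, and the per-generation growth factor is derived
from the initial and final population sizes rather than taken from the
text. The restart rule for a rare disease allele is included.

```python
import random

# Recombination fractions of the three interlocus intervals (from the text).
INTERVALS = (0.001, 0.003, 0.003)

def initial_population(n, n_disease, marker_freqs, rng):
    """Disease haplotypes carry allele 1 at every marker; normal ones are random."""
    p = n_disease / n
    population = [(1,) + (1,) * len(marker_freqs)] * n_disease
    for _ in range(n - n_disease):
        # Frequency of allele 1 among normal chromosomes, given overall frequency q.
        markers = tuple(int(rng.random() < (q - p) / (1 - p)) for q in marker_freqs)
        population.append((0,) + markers)
    return population

def recombine(h1, h2, rng):
    """Cross two haplotypes, treating each interval independently (no interference)."""
    a, b = list(h1), list(h2)
    for i, theta in enumerate(INTERVALS):
        if rng.random() < theta:
            # Exchange all loci distal to interval i.
            a[i + 1:], b[i + 1:] = b[i + 1:], a[i + 1:]
    return tuple(a), tuple(b)

def next_generation(population, size, rng):
    """Draw `size` offspring, each from a randomly chosen pair of haplotypes."""
    offspring = []
    for _ in range(size):
        h1, h2 = rng.sample(population, 2)
        offspring.append(rng.choice(recombine(h1, h2, rng)))
    return offspring

def simulate(n0=2000, n_final=100_000, generations=100,
             marker_freqs=(0.1, 0.5, 0.9), seed=0):
    rng = random.Random(seed)
    growth = (n_final / n0) ** (1 / generations)  # assumed constant growth factor
    while True:
        population = initial_population(n0, round(0.01 * n0), marker_freqs, rng)
        for t in range(1, generations + 1):
            population = next_generation(population, round(n0 * growth ** t), rng)
            if sum(h[0] for h in population) / len(population) < 0.005:
                break  # disease allele too rare: restart at generation zero
        else:
            return population
```

For a quick trial, a call such as `simulate(n0=200, n_final=400,
generations=5, seed=1)` runs in well under a second; the default
arguments correspond to the population sizes described in the text.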

If p dropped below 0.005 at any generation, the
simulation was reinitialized at t = 0 and run again. This rule
kept the frequency of the disease allele from drifting
to zero, especially in early generations; of course, a
population without disease alleles would not be useful
for disequilibrium mapping. On the other hand, marker
allele frequencies were allowed to go to fixation. In this
(rare) instance, the disequilibrium between the marker
and the disease locus was set to zero.

A few special circumstances should be noted: δ
should always be positive, so allele labels were reversed
whenever necessary, although this rarely occurred; the
same action was performed on d; and Q was set to −1
or 1, whichever was appropriate, when one or more cell
frequencies were zero. Because the sign of most of the
measures is arbitrary, we compared absolute values to
determine the maximum disequilibrium value.
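
These conventions can be sketched as follows. The formulas are the
usual definitions of the five measures in terms of the 2 x 2 haplotype
frequencies and are assumed here rather than quoted from this section;
the first index denotes the marker allele and the second the disease
status (1 = disease, 2 = normal). The function name is hypothetical.
In this sketch the marker alleles are relabeled whenever D is negative,
so every measure is returned as a nonnegative value.

```python
import math

def measures(p11, p12, p21, p22):
    """Five disequilibrium measures from 2 x 2 haplotype frequencies."""
    m1, m2 = p11 + p12, p21 + p22    # marker allele frequencies
    d1, d2 = p11 + p21, p12 + p22    # disease and normal frequencies
    if min(m1, m2, d1, d2) == 0:
        # Non-polymorphic locus: disequilibrium is set to zero.
        return {"Delta": 0.0, "D'": 0.0, "delta": 0.0, "Q": 0.0, "d": 0.0}
    D = p11 * p22 - p12 * p21
    if D < 0:
        # Reverse the marker allele labels so the association is positive.
        p11, p12, p21, p22 = p21, p22, p11, p12
        m1, m2 = m2, m1
        D = -D
    return {
        "Delta": D / math.sqrt(m1 * m2 * d1 * d2),
        "D'": D / min(m1 * d2, d1 * m2),
        "delta": D / (d1 * p22),
        # Evaluates to exactly 1 when p12 or p21 is zero.
        "Q": D / (p11 * p22 + p12 * p21),
        "d": D / (d1 * d2),
    }

# Hypothetical frequencies: disease allele 0.01, 10% recombinant disease haplotypes.
example = measures(0.009, 0.297, 0.001, 0.693)
# Complete disequilibrium: every disease chromosome carries marker allele 1.
complete = measures(0.01, 0.297, 0.0, 0.693)
```

With the allele labels oriented this way, comparing absolute values
across loci reduces to comparing the returned values directly.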

At t = 100, the entire population was assayed for the
patterns of linkage disequilibrium. Then a subsample
of 200 disease haplotypes and 200 normal haplotypes
was chosen at random, and this sample was analyzed
for the patterns of linkage disequilibrium. Several
statistics were then recorded. These statistics are
discussed in the text.
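
The subsampling step might be sketched as below. The haplotype encoding
(a tuple whose first element is the disease status, followed by marker
alleles coded 0 or 1) and the function name are assumptions made for
illustration. The function returns the 2 x 2 haplotype frequencies for
one marker in the case-control sample.

```python
import random
from collections import Counter

def case_control_table(population, marker_index, n_cases=200, n_controls=200, rng=None):
    """Sample disease and normal haplotypes and tabulate one marker against disease."""
    rng = rng or random.Random()
    cases = [h for h in population if h[0] == 1]
    controls = [h for h in population if h[0] == 0]
    sample = rng.sample(cases, n_cases) + rng.sample(controls, n_controls)
    # Keys are (marker allele, disease status), both coded 1 or 0.
    counts = Counter((h[marker_index], h[0]) for h in sample)
    total = n_cases + n_controls
    return {
        "p11": counts[(1, 1)] / total,  # associated allele, disease
        "p12": counts[(1, 0)] / total,  # associated allele, normal
        "p21": counts[(0, 1)] / total,  # other allele, disease
        "p22": counts[(0, 0)] / total,  # other allele, normal
    }

# Hypothetical population with a single marker.
toy = [(1, 1)] * 250 + [(1, 0)] * 50 + [(0, 1)] * 3000 + [(0, 0)] * 7000
table = case_control_table(toy, marker_index=1, rng=random.Random(0))
print(table)
```

Because equal numbers of disease and normal haplotypes are drawn, the
disease column of the sampled table always sums to one half, whatever
the disease allele frequency in the population.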